Actor and Observer: Joint Modeling of First and Third-Person Videos
Authors
Abstract
Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-person (actor). Despite this, learning such models for human action recognition has not been achievable due to the lack of data. This paper takes a step in this direction, with the introduction of Charades-Ego, a large-scale dataset of paired first-person and third-person videos, involving 112 people, with 4000 paired videos. This enables learning the link between the two, actor and observer perspectives. Thereby, we address one of the biggest bottlenecks facing egocentric vision research, providing a link from first-person to the abundant third-person data on the web. We use this data to learn a joint representation of first and third-person videos, with only weak supervision, and show its effectiveness for transferring knowledge from the third-person to the first-person domain.

∗Work was done while Gunnar was at Inria.
†Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France.
Similar resources
Supplementary Material: Actor and Observer: Joint Modeling of First and Third-Person Videos
Gunnar A. Sigurdsson¹∗, Abhinav Gupta¹, Cordelia Schmid², Ali Farhadi³, Karteek Alahari². ¹Carnegie Mellon University, ²Inria†, ³Allen Institute for Artificial Intelligence. github.com/gsig/actor-observer. This supplementary material contains the following: 1. Details of the implementations of the new layers. 2. Details of ActorObserverNet. 3. Full derivation of the loss with res...
Action Understanding with Multiple Classes of Actors
Despite the rapid progress, existing works on action understanding focus strictly on one type of action agent, which we call actor—a human adult, ignoring the diversity of actions performed by other actors. To overcome this narrow viewpoint, our paper marks the first effort in the computer vision community to jointly consider algorithmic understanding of various types of actors undergoing vario...
Joint Person Segmentation and Identification in Synchronized First- and Third-person Videos
In a world in which cameras are becoming more and more pervasive, scenes in public spaces are often captured from multiple perspectives by diverse types of cameras, including surveillance and wearable cameras. An important problem is how to organize these heterogeneous collections of videos by finding connections between them, such as identifying common correspondences between people both appea...
Inferring actor communities from videos
In recent years there has been a growing interest in inferring social relations amongst actors in a video using audiovisual features, co-appearance features or both. The discovered relations between actors have been used for identifying leading roles, detecting rival communities in a movie plot etc. In this paper we propose an unsupervised method which uses the video’s transcript and closed cap...
A New Method for Characterization of Biological Particles in Microscopic Videos: Hypothesis Testing Based on a Combination of Stochastic Modeling and Graph Theory
Introduction: Studying motility of biological objects is an important parameter in many biomedical processes. Therefore, automated analyzing methods via microscopic videos are becoming an important step in recent researches. Materials and Methods: In the proposed method of this article, a hypothesis testing function is defined to separate biological particles from artifact and noise in captured v...